Gemini 3


Gemini 3 is now Google's default model for AI Overviews

Engadget

Plus, you can start an AI Mode conversation directly from a summary. Google has begun rolling out two upgrades for Search. Starting today, Gemini 3 is the default model powering AI Overviews. When the company debuted its new family of AI systems last November, it first deployed Gemini 3 in AI Overviews through a router that directed the most difficult questions to the new system.


Why the World's Best AI Systems Are Still So Bad at Pokémon

TIME - Tech

Pillay is an editorial fellow at TIME. Right now, live on Twitch, you can watch three of the world's smartest AI systems (GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro) doing their best to beat classic Pokémon games. At least by human standards, they are not very good. The systems are slow, overconfident, and often confused.


Google's new default AI model: Gemini 3 Flash is faster and stronger

PCWorld

Google has launched Gemini 3 Flash as its new default AI model. According to Google, it is up to three times faster than Gemini 2.5 Flash, more cost-effective, and outperforms previous Flash models in all of the company's internal tests. In several benchmarks, Gemini 3 Flash performed on par with both Gemini 3 Pro and OpenAI's GPT-5.2, and in the multimodal test MMMU-Pro it topped the list with a score of 81.2 percent. The model is geared toward fast, repetitive workflows, bringing improved visual understanding and more accessible AI capabilities to everyday tasks and data analysis.


Google's Gemini 3 Flash model outperforms GPT-5.2 in some benchmarks

Engadget

Gemini 3 Flash is now rolling out to the Gemini app and AI Mode in Search. Almost exactly a month after the debut of Gemini 3 Pro in November, Google has begun rolling out the more efficient Flash version of its latest AI model. According to the company, the new system offers reasoning performance similar to its flagship model at a fraction of the cost, making it ideal for everyday use. In benchmarks, it performed significantly better than Google's previous-generation models, including Gemini 2.5 Pro. More notably, in Google's testing it managed to trade blows with GPT-5.2, the model OpenAI rushed out to counter Gemini 3 Pro.


YouTube is letting creators make playable games with a Gemini 3 tool

Engadget

Don't expect the next Clair Obscur, though. Google's at it again, once more insisting that AI is something people need or want more of in their lives. The latest move comes from YouTube Gaming, which announced a beta for Playables Builder, a prototype web app built with Gemini 3 that lets select YouTube creators make bite-sized games from short text, video, or image prompts, no coding required. YouTube was testing the addition of small-scale games to its desktop and mobile platforms back in 2023, then added multiplayer capability to Playables last year.


Aluminium OS: Everything We Know About the Chromebook Successor

WIRED

Google has officially acknowledged the upcoming merger of Android and Chromebooks, and it may be coming in 2026. It's never fun to be in last place. Google has been coasting along with its Android tablets and Chromebooks for years, playing second fiddle to the bigger players in the game. But the company has a new card up its sleeve: the upcoming merger of its two platforms into something entirely new.


OpenAI releases GPT-5.2 to take on Google and Anthropic

Engadget

The new model is all about professional work. OpenAI's code-red response to Google's Gemini 3 Pro has arrived. On the same day the company announced a Sora licensing pact with Disney, it took the wraps off GPT-5.2. OpenAI is touting the new model as its best yet for real-world, professional use. "It's better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long contexts, using tools, and handling complex, multi-step projects," said OpenAI.


The AI Consumer Index (ACE)

Benchek, Julien, Shetty, Rohit, Hunsberger, Benjamin, Arun, Ajay, Richards, Zach, Foody, Brendan, Nitski, Osvald, Vidgen, Bertie

arXiv.org Artificial Intelligence

We introduce the first version of the AI Consumer Index (ACE), a benchmark for assessing whether frontier AI models can perform everyday consumer tasks. ACE contains a hidden held-out set of 400 test cases, split across four consumer activities: shopping, food, gaming, and DIY. We are also open-sourcing 80 cases as a devset with a CC-BY license. For the ACE leaderboard we evaluated 10 frontier models (with web search turned on) using a novel grading methodology that dynamically checks whether relevant parts of the response are grounded in the retrieved web sources. GPT 5 (Thinking = High) is the top-performing model, scoring 56.1%, followed by o3 Pro (Thinking = On) at 55.2% and GPT 5.1 (Thinking = High) at 55.1%. Model scores differ across domains, and in Shopping the top model scores under 50%. We find that models are prone to hallucinating key information, such as prices. ACE shows a substantial gap between the performance of even the best models and consumers' AI needs.


Reasoning Models Ace the CFA Exams

Patel, Jaisal, Chen, Yunzhe, He, Kaiwen, Wang, Keyi, Li, David, Xiao, Kairong, Liu, Xiao-Yang

arXiv.org Artificial Intelligence

Previous research has reported that large language models (LLMs) demonstrate poor performance on the Chartered Financial Analyst (CFA) exams. However, recent reasoning models have achieved strong results on graduate-level academic and professional examinations across various disciplines. In this paper, we evaluate state-of-the-art reasoning models on a set of mock CFA exams consisting of 980 questions across three Level I exams, two Level II exams, and three Level III exams. Using the same pass/fail criteria from prior studies, we find that most models clear all three levels. The models that pass, ordered by overall performance, are Gemini 3.0 Pro, Gemini 2.5 Pro, GPT-5, Grok 4, Claude Opus 4.1, and DeepSeek-V3.1. Specifically, Gemini 3.0 Pro achieves a record score of 97.6% on Level I. Performance is also strong on Level II, led by GPT-5 at 94.3%. On Level III, Gemini 2.5 Pro attains the highest score with 86.4% on multiple-choice questions while Gemini 3.0 Pro achieves 92.0% on constructed-response questions.


OpenAI Is in Trouble

The Atlantic - Technology

The start-up is falling behind in the AI race. For nearly three years, Marc Benioff, the CEO of Salesforce, was a ChatGPT devotee. Then, late last month, he abruptly converted to Google's chatbot, Gemini. "Holy shit," he wrote on X. "I've used ChatGPT every day for 3 years. Just spent 2 hours on Gemini 3. I'm not going back." When Gemini 3 was released in mid-November, it appeared to crush OpenAI's top model on a suite of evaluations shared by Google. The bot has since received widespread praise from the tech industry. One analyst said that Gemini 3 is "the best model ever."